WebAssembly Branch Hinting? #1

kripken · 2025-06-11T17:47:00Z

@dschuff @sbc100 This is an early beginning of an attempt to support branch hints in LLVM, but I honestly do not really know what I am doing 😄 Maybe you can tell me if I am even in the right direction.

Background: This is an example of an LLVM branch weights hint:

define i32 @branch_weights_test(i32 %a, i32 %b) {
  %1 = icmp ult i32 %a, %b
  br i1 %1, label %fail, label %success, !prof !0

fail:
  ret i32 -1

success:
  ret i32 0
}

!0 = !{!"branch_weights", i32 42, i32 1337}

That br has 42/(42+1337) (estimated) chance to go to the first target.

Reading stuff in the wasm backend and lld, it seems like there is a generic mechanism for generating custom sections in LLVM IR, compiling them, and linking them, so I was wondering if appending something like this could work:

!wasm.custom_sections = !{!1}
!1 = !{!"metadata.code.branch_hint", !"\01\02\03\FF"} ;; right section name, nonsense bytes

It looks like llc emits a custom section from that. So "all" I need is to replace the nonsense bytes with the proper info.

The proper info should somehow refer to the instruction. I am guessing a relocation is needed..? We'd need one both for the function and the offset of the instruction in the function. Is that possible, or should I be doing something entirely different?

(The code in this PR adds an IR pass to the wasm backend, and successfully reads branch_weights. It does not emit anything yet.)

This reverts commit 80210db.

This reverts commit 7294127.

This reverts commit 552d29c.

dschuff · 2025-06-11T18:46:12Z

The proper info should somehow refer to the instruction. I am guessing a relocation is needed..? We'd need one both for the function and the offset of the instruction in the function. Is that possible, or should I be doing something entirely different?

I think that's possible in principle. In assembler terms you can have a label attached to the function start and to each instruction, and you can express the difference between them as an assembler expression. Actually I think that means that you don't even need a relocation, the label difference can be calculated at assembly time.

If we wanted to lower the branch weight metadata into wasm custom section metadata at the LLVM level, then we'd need to make the section payload be something other than an LLVM string constant; it would have to be something that symbolically links to the function or instruction. I guess this must be possible somehow, as debug info does something like that; but it might be a special case. But we'd need something that would survive backend passes reordering the blocks and such.

It might make more sense to do something at IR lowering time instead, and attach the data somehow to the MachineInstr representing the branch. Actually come to think of it, other targets must do something like this too, right? Since the passes that use the branch weights would be backend passes like block placement. Maybe there's some common code we can use for that, that we can use at the end of the backend pipeline when we go from MachineInstr IR to MC.

sbc100 · 2025-06-11T18:54:53Z

This does sounds like something that would need to a relocation, at least for the function index.

As derek said the offset from the begining of the function can likely be calculated by the assembly so should not need a relocation (i.e. the linker cannot change this since it never touches the bytes of a function).

The current relocation we have are listed in llvm/include/llvm/BinaryFormat/WasmRelocs.def. What is the encoding that you need for the function index? We currently have R_WASM_FUNCTION_INDEX_LEB and R_WASM_FUNCTION_INDEX_I32 relocations types.

kripken · 2025-06-11T21:53:46Z

@dschuff

It might make more sense to do something at IR lowering time instead, and attach the data somehow to the MachineInstr representing the branch. Actually come to think of it, other targets must do something like this too, right? Since the passes that use the branch weights would be backend passes like block placement.

Makes sense. I got stuck finding where the IR metadata gets converted into the machine layer, but I did find how things work there: Machine*Passes use MachineBranchProbabilityInfoWrapperPass which provides MBPI->getEdgeProbability(source, target). I verified that works correctly in the last push to this branch: now we find br_if|unless MachineInstrs and print their probabilities.

@sbc100

What is the encoding that you need for the function index?

For the IR option here, we'd need the usual function index LEB, which I am guessing R_WASM_FUNCTION_INDEX_LEB is? The trickier part would be emitting the instruction offset relative to the function start, as @dschuff said, probably debug info does something like that, but I'm not sure how practical that is in general.

Overall it seems like we could do this either

At the IR level, emitting custom section metadata, if we can emit a relocatable offset of an instruction relative to the function start. Getting the weight inputs is in the previous state of the PR.
At the Machine level, if we can emit a custom section from there (if not maybe we'd need to shuttle the information around to the proper place?). Getting the weight inputs is in the current state of the PR.

I'll look into this some more, but any advice between these two very different paths would be welcome, given I'm not familiar with LLVM much.

dschuff · 2025-06-11T23:02:39Z

I think the latter would be most straightforward. If you look in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp you can see where we create custom sections for function attributes (EmitFunctionAttributes) and for the implementation of !wasm.custom_sections (in emitEndOfAsmFile). In this file you can use the MC API (which corresponds more or less directly to the features available in assembers), so to create a custom section you can basically switch to your custom section whenever you want (even during emission of the code section). So for example, during emitFunctionBodyStart, you can emit the relocatable LEB for the function index (and the header for the vector of metadata entries? I guess that would be easier if we had some way to know how many there are before emitting the function); then during emitInstruction you can emit an MCExpr for the offset.

You'll end up with asm code that switches back and forth between the code and metadata sections, emitting bytes incrementally to the latter (just as each instruction emits incrementally to the code section). The assembler will combine these into one section in the object file, and then the linker will combine them into one section in the output. We just need to figure out how get the header correct (i.e. the length of the vector of functions)?

The other option (instead of incrementally emitting the custom section during instruction emission) could be to emit the whole function's worth of the section at once at the end of the function (or even at the end of the object file). That could work if we can't know ahead of time how many annotations each function has, but I think we'd still need to make sure we emit labels for each branch (during instruction emission) to calculate the offsets, and we'd still need to figure out how to get things right after linking.

kripken · 2025-06-12T18:05:16Z

Thanks, I think I wrapped my head around some AsmPrinter basics, and got it to emit an empty branch hint section... next, the real 99%

kripken · 2025-06-20T21:34:34Z

Btw, the error you were getting before about the expression not being absolute was emitting the function symbol, not the label difference, right?

Yes, at some point I noticed that diff ended up as absolute, I guess since the two symbols are in the same section. So all the problem here is really just about the relocatable function symbol index.

dschuff · 2025-06-21T00:01:16Z

llvm/lib/MC/MCObjectStreamer.cpp

  if (Value->evaluateAsAbsolute(IntValue, getAssemblerPtr())) {
    emitULEB128IntValue(IntValue);
    return;
  }
-  insert(getContext().allocFragment<MCLEBFragment>(*Value, false));
+
+  if (!PadTo) {


To be even more specific here, the thing I was hoping for was that we could tell, based on some property of Value, whether there needs to be a relocation. If so, we can basically just assume that the size will be 5 (The only non-32-bit relocation I know of would be for memory addresses, so there could be i64.const instruction in code with 9-byte relocatable LEBs, but I guess those are handled differently elsewhere). And then we don't need the explicit PadTo in the streamer API or the separate assembler directive.
The question of course, is whether that's possible. MCExpr has evaluateAsRelocatable alongside evaluateAsAbsolute but currently I'm not totally sure what it means.

kripken · 2025-06-23T20:13:20Z

llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp

@@ -609,6 +654,65 @@ void WebAssemblyAsmPrinter::emitJumpTableInfo() {
  // Nothing to do; jump tables are incorporated into the instruction stream.
 }

+std::optional<bool> WebAssemblyAsmPrinter::getBranchHint(const MachineInstr& MI) {


@aheejin This is the code I mentioned in the meeting, PTAL if you have time. Maybe there is a better way to do this?

I added some comments here and in the calling code that hopefully explain why it works as it does. At a high level:

During emitInstruction we see if we need a hint on a br_if. We do it at that time, so that we can emit a Label if we want a hint (we need the label to refer to the br_if from the custom section). That is after CFGStackify. At this stage we just emit the Labels and stash the information.

After emitting all functions, we gather the stashed information, and emit the final section.

I don't think there is a better way if we do this after CFGStackify, and this is roughly what I'd do as well. But this code may not handle try-catch and exceptions.

When EH is enabled, one of a BB's successor can be an unwind destination, so in theory a BB can have three successors. I don't think we generate that kind of code, but I think it is safe to check it and exclude it. (by calling MachineBasicBlock::isEHPad() or something)

And with try-catch the fallthrough block may not be the next block. For example:

try ... br_if ... catch ;; br_if falls through to NOT here ... end ;; br_if falls through to here

In this case both the true and false destinations will be successors but it may be tricky to figure out which one is which... To figure this out we may need to keep track of the control flow stack so we can compute the (true) destination of the branch? Which sounds like an overkill. This kind of fallthrough is generated by this optimization which may not be critical, meaning if someone wants performance they would've run binaryen anyway, so if this becomes a problem we can consider removing it.

Interesting, what does an unwind target from a BB that ends in a br_if mean? That there is a call in the middle of the BB, and it threw? (Do such calls not need to be terminators of BBs? Sorry, I don't know much about wasm backend control flow...)

Come to think of it, I think there can't really be a case with three successors (two branch successors and an unwind successor). Because, so when a BB has a call that can throw and has an unwind BB successor, before CFGStackify it would look like

... bb: ;; successor: ehpad, cont call $maythrow br $cont ehpad: catch ... cont: ...

And we remove that br $cont in CFGStackify because it is not necessary. In this case there's no chance that bb will end with br_if. Also this is not feasible:

bb: ;; successor: ehpad, somewhere, cont call $maythrow br_if $somewhere br $cont ehpad:

Because a BB with a throwing call (and has an unwind successor) wouldn't have a br_if after the throwing call.

What I can't guarantee we can't have is the case of try block's last BB that doesn't have an unwind dest BB. Because we can wrap many throwing calls within a big try-catch, the last BB within a try part may not be a BB with an unwind destination. For example:

bb0: ;; successor: ehpad, cont call $maythrow br $cont bb1: ;; successor: ehpad, cont call $maythrow br $cont bb2: ;; successor: somewhere, cont ... something else ... br_if $somewhere br $cont ehpad: ... cont:

In this case, if br $somewhere_else is removed, we don't have the info on bb2's fallthrough block. This situation is pretty contrived and it may not happen in reality. I just can't guarantee it won't. But given that this branch hint is afer all a 'hint', we may not need to consider this case. When seeing a br_if, just checking if the fallthrough (= next) block is an EH pad, and if so, not annotating that branch can be sufficient.

About

In this case, if br $somewhere_else is removed

Was this a typo of br $cont, or br_if $somewhere?

I was not aware about terminator lists until you wrote this comment... interesting.

PTAL at the commit where I just tried to handle the corner cases you mention.

About

In this case, if br $somewhere_else is removed

Was this a typo of br $cont, or br_if $somewhere?

Sorry, br $cont

I added some comments below. Please ignore my first comment (which I've now deleted); I was confused and the comment was incorrect.

aheejin · 2025-06-26T01:23:17Z

llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp

+  // the parent block of this BR_IF. (If there are three successors, due to some
+  // terminator before us, give up on this hint.)


How about due to an additional unwind successor or ... EH pad successor?

aheejin · 2025-06-26T01:26:26Z

llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp

+  // It must be one of the successors.
+  assert(FalseDest == MBB1 || FalseDest == MBB2);


This assertion does not hold if we fall through to an EH pad

aheejin · 2025-06-26T01:30:16Z

llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp

+  // Stop if the fallthrough is an EH pad, because that could happen like this:
+  //
+  //  bb1:
+  //    ;; successor: if.true, otherwise
+  //    ..
+  //    br_if $if.true
+  //    br $otherwise
+  //
+  //  ehpad:
+  //
+  // If that br $otherwise is optimized out, it would look like we fall through
+  // to the ehpad (since it is the block physically after us), but in wasm's
+  // structured control flow that is not the case. Rather than try to find the
+  // true fallthrough, give up on a hint in this rare case.
+  if (FalseDest->isEHPad())
+    return {};


I think this can have another corner case too... For example

bb0: ... br_if bb1: delegate ehpad: catch

So this delegate here would have been inserted after we first place try-catch markers to fix unwind mismatches. We add delegate in a separate BB because it is also a terminator. In this case the fallthrough BB is not an EH pad but we still shouldn't fall through to the next BB.

I think a safer solution could be... Just bail out when one of the ParentBB's successor is not the next BB in the layout order. I think this can handle both cases.

Thanks, that is much simpler and makes perfect sense. The last commit includes that and other fixes from your comments.

Yeah this part looks good to me now (I haven't yet looked at other parts)

Don't look carefully at the other parts yet 😆 There are some bad hacks, and I am still debugging the case of multiple files...

aheejin · 2025-06-30T03:13:21Z

I haven't checked the code yet, but this PR was uploaded yesterday: llvm#146230

kripken · 2025-06-30T05:34:01Z

Thanks! Good to know...

I'll take a closer look Monday, but looks like that PR is in better shape than mine, none of the horrible hacks at least... so I imagine the LLVM side of my work won't be needed. Funny timing, I just got multi-file hinting working Friday and the first full benchmarks up and running 😆

kripken · 2025-06-30T20:46:19Z

That other PR definitely looks like the way to go in LLVM, compared to this, so I'll close this one...

Last update on my work in this area, the code here is hackish but fully functional, though it does need --no-gc-sections during link (to avoid emitting a branch hint referring to a function that was dce'd). With that, I could run real-world benchmarks. I found a few projects that looked relevant in the emscripten benchmark suite:

SQLite annotates likely() conditions, however, those are meant for the reader, not the optimizer.
Bullet has likely() conditions, but actually only in the SPU backend (Cell processor from the PS3 back in the day, which had no branch predictor, so it needed manual annotations of likelihood...)
Poppler. Has likely annotations in good places, and the final binary has 1,000 of them. However, the benchmark is no faster.

Somewhat disappointing, but I did confirm that making an artificial benchmark with a 30% benefit is not difficult: https://gist.github.com/kripken/09ad895764cd6cd3b7a8a6c13c643c93

I'll wait for that other PR to land in LLVM, and meanwhile am working on wasm-opt optimizer support for branch hints (properly updating the hint when we flip an if, for example).

Lukasdoe · 2025-07-01T08:52:20Z

Branch hint support (or support for code metadata in general) in wasm-opt would be great! Currently, I have to hard-disable wasm-opt and meta-dce in emscripten since (as you said) it invalidates branch hints.

I'll respond to your other points about the performance under the other PR. Sorry for noticing this PR here so late, it didn't appear in any searches :/

kripken · 2025-07-01T15:29:16Z

Sorry for noticing this PR here so late, it didn't appear in any searches :/

No worries, maybe I should have opened it on LLVM, but it began as more of a question/experiment...

Anyhow, what matters is we get good support in LLVM, and your PR looks right for that.

kripken added 15 commits June 10, 2025 11:48

naive start

552d29c

more

7294127

bad

80210db

Revert "bad"

7005833

This reverts commit 80210db.

Revert "more"

d989daf

This reverts commit 7294127.

Revert "naive start"

bf22519

This reverts commit 552d29c.

newPASS

678e2d5

work

7a715ca

work

9157b12

work

7a6be7a

work

ae889b4

work

88f4554

reading builds?

fd8359d

reading builds

d794e4b

debug

0b68888

using MachineBasicBlock

1d122a2

clean

817a8e9

emit a tiny empty branch hint section from AsmPrinter

5917102

kripken added 7 commits June 12, 2025 16:51

start to emit a hint

a7b1648

emit hint from emitInstruction

9d25bf3

gather for the end

9e8a8f7

fix

b1a29d2

clean

66ec0df

clean

05f533d

leb.all.the.things

698f1b4

horribler hack, but wasm tests finally pass

c708aa8

kripken added 2 commits June 20, 2025 14:37

simplify using emitAbsoluteSymbolDiffAsULEB128

4784fb4

awful attempts to find the proper mbb

dc305e3

dschuff reviewed Jun 21, 2025

View reviewed changes

kripken added 4 commits June 23, 2025 09:18

fix tests

3956ad6

clenaup

ee54625

typo

8b60261

cleanup

bb921ee

kripken commented Jun 23, 2025

View reviewed changes

kripken added 8 commits June 23, 2025 13:16

comments

96b80ce

wild and misguided effort to merge branch hint sections

31b8996

Handle some corner cases of wasm control flow

0599b04

Get MBB/Terminator code to compile

c492b05

almost get lld changes to compile

aee5b2a

lld compiles

6b32791

fix tests for padded num functions

e98fc21

start new tests

46650d9

aheejin reviewed Jun 26, 2025

View reviewed changes

kripken added 4 commits June 26, 2025 09:36

improve cfg analysis following feedback

0627b7c

DEBUG the bizarre problem

87f0a68

cleanup

899ba0a

multifile test passes

c87bcd1

kripken mentioned this pull request Jun 30, 2025

How is this enabled during compilation? WebAssembly/branch-hinting#37

Open

kripken closed this Jun 30, 2025

		// the parent block of this BR_IF. (If there are three successors, due to some
		// terminator before us, give up on this hint.)

		// It must be one of the successors.
		assert(FalseDest == MBB1 \|\| FalseDest == MBB2);

WebAssembly Branch Hinting? #1

WebAssembly Branch Hinting? #1

Uh oh!

Conversation

kripken commented Jun 11, 2025

Uh oh!

dschuff commented Jun 11, 2025

Uh oh!

sbc100 commented Jun 11, 2025

Uh oh!

kripken commented Jun 11, 2025

Uh oh!

dschuff commented Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kripken commented Jun 12, 2025

Uh oh!

kripken commented Jun 20, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aheejin Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kripken Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aheejin Jun 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aheejin Jun 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aheejin commented Jun 30, 2025

Uh oh!

kripken commented Jun 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kripken commented Jun 30, 2025

Uh oh!

Lukasdoe commented Jul 1, 2025

Uh oh!

kripken commented Jul 1, 2025

Uh oh!

Uh oh!

dschuff commented Jun 11, 2025 •

edited

Loading

aheejin Jun 25, 2025 •

edited

Loading

kripken Jun 25, 2025 •

edited

Loading

aheejin Jun 26, 2025 •

edited

Loading

aheejin Jun 26, 2025 •

edited

Loading

kripken commented Jun 30, 2025 •

edited

Loading